7 research outputs found

    Integrating and visualising primary data from prospective and legacy taxonomic literature

    Specimen data in taxonomic literature are among the highest quality primary biodiversity data. Innovative cybertaxonomic journals are using workflows that maintain data structure and disseminate electronic content to aggregators and other users; such structure is lost in traditional taxonomic publishing. Legacy taxonomic literature is a vast repository of knowledge about biodiversity. Currently, access to that resource is cumbersome, especially for non-specialist data consumers. Markup is a mechanism that makes this content more accessible, and is especially suited to machine analysis. Fine-grained XML (Extensible Markup Language) markup was applied to all (37) open-access articles published in the journal Zootaxa containing treatments on spiders (Order: Araneae). The markup approach was optimized to extract primary specimen data from legacy publications. These data were combined with data from articles containing treatments on spiders published in Biodiversity Data Journal, where XML structure is part of the routine publication process. A series of charts was developed to visualize the content of specimen data in XML-tagged taxonomic treatments, either singly or in aggregate. The data can be filtered by several fields (including journal, taxon, institutional collection, collecting country, collector, author, article and treatment) to query particular aspects of the data. We demonstrate here that XML markup using GoldenGATE can address the challenge presented by unstructured legacy data and can extract structured primary biodiversity data that can be aggregated and jointly queried with data from other Darwin Core-compatible sources, and we show how visualization of these data can communicate key information contained in biodiversity literature. We complement recent studies on aspects of biodiversity knowledge by using XML-structured data to explore 1) the time lag between species discovery and description, and 2) the prevalence of rarity in species descriptions.
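
    To illustrate the kind of structure such markup recovers from legacy text, the sketch below parses a toy tagged treatment with Python's standard library. The element and attribute names are illustrative stand-ins, not the actual GoldenGATE or Darwin Core schema.

```python
# A minimal sketch of extracting specimen records from an XML-tagged
# taxonomic treatment. Element names here are hypothetical placeholders.
import xml.etree.ElementTree as ET

TREATMENT = """
<treatment taxon="Araneus diadematus">
  <materialsCitation>
    <collectingCountry>Germany</collectingCountry>
    <collectorName>A. Collector</collectorName>
    <specimenCount>3</specimenCount>
    <collectionCode>ZFMK</collectionCode>
  </materialsCitation>
</treatment>
"""

def specimen_records(xml_text):
    """Yield one flat dict per materials citation in a treatment."""
    root = ET.fromstring(xml_text)
    taxon = root.get("taxon")
    for citation in root.iter("materialsCitation"):
        record = {"taxon": taxon}
        for field in citation:
            record[field.tag] = field.text
        yield record

for rec in specimen_records(TREATMENT):
    print(rec)
# {'taxon': 'Araneus diadematus', 'collectingCountry': 'Germany', ...}
```

    Records flattened this way can be filtered by country, collector or collection, which is the basis of the filtering and charting described above.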

    Enriched biodiversity data as a resource and service

    Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon. Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modelling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modellers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain. Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
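
    One of the enrichment steps named above, taxonomic name resolution, can be sketched against GBIF's public species-match endpoint. The endpoint URL is GBIF's documented API; the wrapper itself is an illustrative assumption, not software produced at the hackathon.

```python
# A hedged sketch of taxonomic name resolution: match a verbatim name
# string against the GBIF backbone taxonomy via its public REST API.
import json
import urllib.parse
import urllib.request

GBIF_MATCH = "https://api.gbif.org/v1/species/match"

def resolve_name(verbatim_name):
    """Return GBIF's best match for a verbatim taxon name string."""
    url = GBIF_MATCH + "?" + urllib.parse.urlencode({"name": verbatim_name})
    with urllib.request.urlopen(url) as resp:
        match = json.load(resp)
    # matchType is NONE when GBIF cannot resolve the string.
    return {
        "query": verbatim_name,
        "scientificName": match.get("scientificName"),
        "matchType": match.get("matchType"),
        "confidence": match.get("confidence"),
    }

print(resolve_name("Araneus diadematus"))
```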

    Community engagement: The ‘last mile’ challenge for European research e-infrastructures

    Europe is building its Open Science Cloud: a set of robust and interoperable e-infrastructures with the capacity to provide data and computational solutions through cloud-based services. The development and sustainable operation of such e-infrastructures are at the forefront of European funding priorities. The research community, however, is still reluctant to engage at the scale required to signal a Europe-wide change in the mode of operation of scientific practices. The striking differences in uptake rates between researchers from different scientific domains indicate that communities do not share the benefits of these European investments equally. We highlight the need to support research communities in organically engaging with the European Open Science Cloud through the development of trustworthy and interoperable Virtual Research Environments. These domain-specific solutions can support communities in gradually bridging technical and socio-cultural gaps between traditional and open digital science practice, better diffusing the benefits of European e-infrastructures.

    Unifying European Biodiversity Informatics (BioUnify)

    In order to preserve the variety of life on Earth, we must understand it better. Biodiversity research is at a pivotal point, with research projects generating data at an ever increasing rate. Structuring, aggregating, linking and processing these data in a meaningful way is a major challenge. The systematic application of information management and engineering technologies in the study of biodiversity (biodiversity informatics) helps transform data into knowledge. However, concerted action by existing e-infrastructures is required to develop and adopt common standards, provide for interoperability, and avoid overlapping functionality. This would result in the unification of the currently fragmented landscape that restricts European biodiversity research from reaching its full potential. The overarching goal of this COST Action is to coordinate existing research and capacity building efforts, through a bottom-up, trans-disciplinary approach, by unifying biodiversity informatics communities across Europe in order to support the long-term vision of modelling biodiversity on Earth. BioUnify will: 1. specify technical requirements, and evaluate and improve models for efficient data and workflow storage, sharing and re-use, within and between different biodiversity communities; 2. mobilise taxonomic, ecological, genomic and biomonitoring data generated and curated by natural history collections, research networks and remote sensing sources in Europe; 3. leverage the results of ongoing biodiversity informatics projects by identifying and developing functional synergies at the individual, group and project levels; 4. raise technical awareness and transfer skills between biodiversity researchers and information technologists; 5. formulate a viable roadmap for achieving the long-term goals of European biodiversity informatics, one that ensures alignment with global activities and translates into efficient biodiversity policy.

    BioVeL: a virtual laboratory for data analysis and modelling in biodiversity science and ecology

    Background: Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. Results: BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for online collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and, through a programme of external engagement, actively encouraged scientists and third-party application and tool developers to try out the services and contribute to the activity. Conclusions: Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.
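
    The workflow pattern described above can be sketched in miniature: steps (here plain functions standing in for curated Web services or wrapped R programs) are composed in sequence, and a provenance record is kept per step to support the repeatability the abstract mentions. All names below are illustrative assumptions, not the BioVeL API.

```python
# A minimal sketch of composing workflow steps with per-step provenance.
import json
import time

def run_workflow(steps, data):
    """Apply steps in order, logging a provenance record per step."""
    provenance = []
    for step in steps:
        started = time.time()
        data = step(data)
        provenance.append({
            "step": step.__name__,
            "seconds": round(time.time() - started, 3),
            "output_summary": repr(data)[:80],
        })
    return data, provenance

def fetch_occurrences(species):     # stand-in for a data-access service
    return {"species": species, "points": [(52.5, 13.4), (48.9, 2.4)]}

def fit_niche_model(occurrences):   # stand-in for a wrapped R program
    lats = [lat for lat, lon in occurrences["points"]]
    return {"species": occurrences["species"],
            "lat_range": (min(lats), max(lats))}

result, log = run_workflow([fetch_occurrences, fit_niche_model],
                           "Araneus diadematus")
print(json.dumps(log, indent=2))
```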

    Inferring large phylogenies: The big tree problem

    Phylogenetic trees are graph-like structures whose topology describes the inferred pattern of relationships among a set of biological entities, such as species or DNA sequences. Inference of these phylogenies typically involves evaluating large numbers of possible solutions and choosing the optimal topology, or set of topologies, from among all evaluated solutions. Such analyses are computationally intensive, especially when the pattern of relationships among a large number of entities is being sought. This thesis introduces two novel algorithms for the inference of large trees; one is applicable to the likelihood framework, the other to the Bayesian framework. Both approaches rely on the notion of a multi-modal tree ‘landscape’ through which inferential algorithms traverse. Using sampling techniques, the landscape can be perturbed sequentially, so that local optima can be evaded. The algorithms find good solutions in reasonable time, as demonstrated using real and simulated data sets. An example of large phylogeny inference is presented in the form of a novel estimate of Primate phylogeny, the largest estimate for this Order to date. The phylogeny is based on previously published smaller phylogenies, and hence serves as a summary of the present state of Primate phylogeny. In addition to the topology of this ‘supertree’, composite estimates of divergence times are also provided. These estimates are based on multiple, clock-like genes combined using a novel approach presented here. Handling sets of trees and sequences poses practical problems in terms of data conversion and interoperation between computer programs. The thesis therefore concludes with a chapter discussing suitable data structures and programming patterns for phylogenetics. The appendix discusses an implementation of some of these concepts in an object-oriented application programming interface.
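
    The general search idea, climbing a multi-modal landscape and perturbing the current optimum to evade local peaks, can be sketched abstractly. In the thesis the state would be a tree topology and neighbours would come from moves such as nearest-neighbour interchange; the toy score and move set below are stand-ins, not the thesis algorithms.

```python
# A minimal sketch of perturbation-driven hill-climbing on a rugged
# score landscape, illustrating how local optima can be evaded.
import random

def search(initial, neighbours, score, rounds=20, kicks=5):
    """Repeated hill-climbing, restarting from a perturbed optimum."""
    best = current = initial
    for _ in range(rounds):
        # Climb: take the best neighbour until none improves the score.
        while True:
            cand = max(neighbours(current), key=score)
            if score(cand) <= score(current):
                break
            current = cand
        if score(current) > score(best):
            best = current
        # Perturb: several random steps to hop out of the local optimum.
        for _ in range(kicks):
            current = random.choice(neighbours(current))
    return best

# Toy landscape over the integers, in place of a tree landscape.
score = lambda x: -(x % 17) * (x % 13) + x // 50
neighbours = lambda x: [x - 3, x - 1, x + 1, x + 3]
print(search(0, neighbours, score))
```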

    Emerging semantics to link phenotype and environment

    Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the Extensible Observation Ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
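
    The kind of linkage such data models aim for can be sketched as a handful of RDF triples: an observation connecting an organism's phenotype to the environment it was recorded in. The predicate and class URIs below are illustrative placeholders, not the actual OBOE or BCO terms.

```python
# A hedged sketch of a phenotype-environment linkage as RDF triples.
# Requires the third-party rdflib package; all URIs are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/linkage/")

g = Graph()
obs = EX["observation/1"]
g.add((obs, RDF.type, EX.Observation))
g.add((obs, EX.ofTaxon, Literal("Rana temporaria")))
g.add((obs, EX.hasPhenotype, Literal("dorsal stripe present")))
g.add((obs, EX.inEnvironment, EX["environment/pond"]))
g.add((EX["environment/pond"], EX.waterTemperatureC, Literal(14.5)))

print(g.serialize(format="turtle"))
```

    With real ontology terms in place of the placeholders, graphs like this from different studies could be merged and queried together, which is the integration the use cases above call for.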